Abstract

The present project provided for the training of teachers and aides for
a summer Head Start program which employed a modified version of the
language curriculum developed by Bereiter and Engelmann. A preliminary
assessment of the curriculum was also completed in respect to the level
of achievement realized by the students, using the School Readiness
Tasks by Barbara Bateman as the dependent variable, in respect to
absolute norms and in comparison with other Head Start pupils that had
received a different adaptation of the language curriculum.

The curriculum was also examined for evidence of an internal,
hierarchical structure among the specific skills that comprise the objective
of the curriculum, and was further analyzed for correlations with a 
series of analogue tasks that were specifically developed for the pur- 
pose of assessing at least some of the same cognitive characteristics 
that were thought to be fostered by the curriculum.

General Introduction



The University of Hawaii Head Start Evaluation and Research Center has 
continued to explore the Bereiter-Engelmann teaching strategy. During
1966-67, Center personnel developed detailed curricular materials that
were based upon the Bereiter-Engelmann curriculum but intended for use
under local conditions, and tried out and revised the materials in 
preliminary form. During the provisional trials, frequent sessions with 
the teachers were conducted for the purpose of gaining feedback con- 
cerning the adequacy of the materials and procedures. Many class ses- 
sions were video taped. 

In a broad sense, a primary purpose of the present project was to 
provide continuity for further work in 1968-69 with the curriculum that
had been evolving during 1966-67. In particular, if the materials were
to be used at the beginning of regular Head Start programs in 1967-68,
supervised training of teachers who were likely to be employed in 1967-68
and a limited tryout of the first sections of a manual seemed essential.
For this reason, a special training program was conducted in the summer
of 1967 for six summer Head Start teachers and six aides. Following the
training, they used the beginning sections of the manual with summer
classes.

Consideration of the Bereiter-Engelmann strategy led to some additional
questions which could be explored in a preliminary way at the same time
the new manual materials were being used with summer Head Start classes. 



The 1967 Summer Training Sessions on a Language Curriculum
Selection of teachers and aides.

The local Head Start Evaluation and Research Center had a number of 
applications from teachers for Head Start teaching positions. From this 
pool of applicants, seven teachers who were highly rated by their peers
were selected for training during the summer project. Six aides,
unskilled members of the community, were hired under the provisions of CAP
organizations. 

Content of instruction.

The major content for the summer training of teachers and aides focused 
upon a formal presentation of the material presented by Bereiter and 
Engelmann (1966), but was supplemented by other relevant articles. The 
following topics were covered during the training: 

1. The specific objectives of the summer project.

2. The organizational problems in the administration of the summer
program.

3. The language problems of the disadvantaged child.

4. The instructional language program as developed by Bereiter and
Engelmann and its implementation in a normal nursery school
setting.

5. The use of contingency management, as developed by Lloyd Homme
(1966), and reinforcement schedules.

6. Paper work: The coding forms which had been developed to determine
individual progression through the sequence of grammatical
forms.

7. Implementation of the parent education program, and discussion
of its aims and scope.

8. The utilization of aides in the classroom, incorporating the
concept of the aide as a teaching person.

Training of teachers and aides.

The teachers met daily for a week and on Tuesday and Thursday the aides
and teachers met together. In the discussions which followed the presen-
tation of the program and methodology, emphasis was placed on the concept
of the aide as a teaching person who would reinforce the sentence pat-
terns introduced by the teacher. Video tapes were helpful during these
sessions in providing concrete episodes to illustrate program development
and child behavior.

Follow-up sessions.

Teachers and aides met once weekly with separate discussion leaders to
discuss:

1. Problems relating to child behaviors.

2. Problems relating to the vertical progression of the children
individually and in groups.

3. Questions relating to the part that the aides could play in the
teaching process.

4. The use of volunteer mothers.

5. The success or failure of various types of teaching techniques
and lesson planning.

6. Problems related to the use of reinforcement only in conjunc-
tion with the language curriculum.

7. Problems encountered in shifting from material to token rewards.

Outcomes.



On the basis of teacher discussion, it was possible to incorporate many 
suggestions into the manual which was in the process of development 
during the summer and which was to be the basis for an experimental treat-
ment in 1967-68. Methods of coding were systematized and ideas for a
good teacher observation form were developed. It was decided on the
basis of the summer experience that aides should not supplant the
teacher in the language sessions but that the regular teacher should be
responsible for this aspect of the program. On the basis of teacher
and aide comments, it was decided to provide materials in the manual
for two other semi-structured activity groupings to be carried on during
the teaching hour.



Part of the 1966-67 research program consisted of the development of a 
series of T.V. tapes in order to obtain a record of the experimental 
procedure and content. During the summer training session for teachers
and aides, the tapes were used in order to demonstrate the
procedure and to show deviant behavior and how it could be handled dur-
ing the structural language sessions. Since the tapes proved to be
useful as a training device, analysis of all the tapes was begun during
the summer with a view to producing one single 45-minute track for train-
ing new teachers entering the program in the fall.



The following guidelines were proposed:

1. The purpose of the tape would be to demonstrate this instruc-
tional approach to a preschool language curriculum.

2. It would include examples of the major steps in the sequence
of the curriculum from the beginning program through the
advanced program.

3. Emphasis would be placed on the need to eliminate extraneous
stimuli.

4. It would emphasize the use of the systematic reward schedule,
only with the language curriculum.

5. It would include examples of variations in the physical set-
up of the room, of teaching strategies, and of dialogue and
tandem teaching.

6. Teaching segments which illustrated the need to include tasks
of varying level within the individual language lesson would
be noted.

It was also suggested that in order to stimulate critical thinking and
discussion about the instructional approach, segments should be included
to illustrate problem areas, including those related to individual ver-
sus group participation and disciplinary problems.






Problems Related to the Evaluation of the Language Curriculum 
Background.

Bereiter (1967) has suggested that much of the failure of education has 
been due to the unwitting adoption of a model that leads to a statement 
of broad curricular objectives. The problems in assessing attainment of 
broad objectives, however, lack sufficient definition to permit solu- 
tion. On the other hand, problems in measuring attainment of very 
narrowly defined objectives tend to be more specific and limited, and 
consequently more amenable to solution. It is argued that if global 
objectives are partitioned, the problems of education will be resolved 
within the context of the limited objectives, leading to greater educa- 
tional gains. 

At times such an approach has been refined to the point where the 
behavior elicited in the training situation is, in fact, the objective 
of instruction. This has been the case, for example, in some linear 
programming. In these instances, presumably, the behavior can be sys- 
tematically reinforced in the stimulus context in which the behavior is 
desired, with the result that behavior in the future will be more 
appropriate to the situation.

As educational objectives become more specific, they become increasingly
congruent with educational activities. The instructional activities
tend to be "to the point," and it is possible to determine whether or
not success has been achieved. With specific behavioral objectives, it
is a relatively simple matter of observation to determine whether or not
they have been realized. For objectives couched in terms of general
characteristics, a more complicated problem of determining criteria of
success confronts the investigator.

This general trend to specificity in curricula has carried over into 
compensatory education (e.g., Bereiter and Engelmann, 1966). In this 
case, however, more is required than a relatively simple refinement of 
general objectives. It is necessary first to define the objectives. 

The culturally deprived are deficient in nearly every area of ability
that is susceptible to objective assessment. They have lacked the type
of intellectual environment that is characteristic of children from the
middle and upper social echelons. It would be possible to gear a pro-
gram of compensatory education to any one of these many differences,
and indeed programs have been directed to many of them. But at this
point it is productive to recall Gulliksen's (1950) distinction between
intrinsic and extrinsic correlates of success in education. Since our
primary concern is with programs that will enable the educationally 
disadvantaged to compete better in an academic situation, it is neces- 
sary to focus attention upon those specific characteristics or deficits 
that are intrinsically related to academic achievement.

One of the salient characteristics of culturally disadvantaged children
is their severe deficit in language skills. Since they are also re-
tarded in general educational development, these two characteristics are
at least related. Since much of the curricula is [...] language and since
the deficit in language skills antedates formal educational activities,
it seems reasonable that remedial efforts should be directed early toward
overcoming the language deficiency.

The concept of language skills in the present context lacks precise
definition and could cover a variety of activities such as word naming,
word rhyming, or oral reading. Bereiter and Engelmann focus more upon
the characteristics of sentences and [...]sion. They contend that in
language training the child learns concepts and rules governing the
manipulation of concepts; once the formal characteristics of language
are acquired the child [...] that will foster the acquisition of new
knowledge and lead to the solution of logical problems.

Specific problems.

The objectives of the Bereiter-Engelmann language curriculum are specif-
ically defined in that efforts are concentrated upon [...] language skills
and facility with basic sentence usage. But they are at the same time
general in that specific behaviors fostered in the instructional setting
are not the ultimate goals, but [...] the development of skills that will
have their [...] situations outside of the instructional setting. The child
who learns the logic of language that makes inference possible should be
more intelligent, at least in a limited sense, as a result of instruction.

To the extent that a curriculum seeks to foster specific behaviors,
assessment lies primarily in a structured observation of behavior in
order to determine whether or not the desired behavior occurs in a
specified situation. An evaluation of the utility [...] would involve a
determination of the extent to which the behaviors could be observed in
comparable children who had not had the specific training. It would be
hoped, naturally, that children who had received the training would
exhibit particular skills to a greater degree than children not having
had the benefit of the training.

A secondary problem related to specific attainment within the present
curriculum is the extent to which it has a hierarchical structure. The
cognitive skills cultivated by the curriculum have not been regarded as
independent. On the contrary, in many instances the attainment of one
skill has been considered prerequisite to the attainment of another.
The skills have been ordered in terms of increasing complexity and dif-
ficulty as this is revealed by logical analysis, but so far little is
known concerning the interdependence of the skills.

To the extent that the attainment of one skill is prerequisite to that of
another, the statistical characteristic of the performance of the sub-
jects should be that an item measuring a higher level of achievement
will be passed only if all items at a lower level have been passed.
Thus passing a higher item implies passing all lower items. Given
variance among the items, assuming a more or less constant amount of
[...] per child, the means of the items should be ordered in accord
with the ordering of the items dictated by the vertical structure; that
is, the means should descend in magnitude. Certain expectations re-
garding the correlations among the items also will be taken up later.
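
The two expectations just stated (a pass on a higher item implies passes on all lower items, and item means that descend with level) can be checked mechanically. The following is only a minimal sketch in Python; the function names and the response patterns are hypothetical illustrations and are not data or procedures from the present study.

```python
from statistics import mean


def is_cumulative(pattern):
    """Return True if a 0/1 response pattern, ordered from the lowest
    to the highest item, never shows a pass after a failure."""
    seen_failure = False
    for passed in pattern:
        if passed and seen_failure:
            return False
        if not passed:
            seen_failure = True
    return True


def item_means(patterns):
    """Mean score of each item (column) across all subjects (rows)."""
    return [mean(column) for column in zip(*patterns)]


def means_descend(means):
    """True if the item means never increase from lower to higher items."""
    return all(a >= b for a, b in zip(means, means[1:]))


# Hypothetical pass/fail records for four children on four ordered items.
patterns = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

print([is_cumulative(p) for p in patterns])
print(item_means(patterns))
print(means_descend(item_means(patterns)))
```

A pattern such as [1, 0, 1, 0] would fail the first check, since a higher item is passed after a lower one has been failed.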

With more general objectives, which are also applicable with the 
Bereiter-Engelmann curriculum, additional methods for evaluation are 
needed. The tendency in this case has been to rely upon indirect
assessment of the desired characteristics through the use of standard- 
ized tests. 

Accumulating evidence indicates that changes do occur in test perfor-
mance on various tests of intelligence and cognitive functioning as a
result of participation in special educational programs, but the inter-
pretation is extremely difficult. Three well-known factors which
militate against a straightforward interpretation of increases in test
performance are the effects of regression, retesting, and maturation.

These problems to a degree can be circumvented with appropriate statis- 
tical control, although few studies reflect the necessary randomization 
of subjects or testing conditions. A further problem in the interpre- 
tation of apparent gains is far more resistant to appropriate control. 
This problem arises from the possibility of cultivating subject charac- 
teristics that encourage better test performance but contribute neither 
to a heightened level in the characteristic the test is intended to 
measure nor to better performance on the sorts of tasks that the test 
is intended to predict. Such characteristics are extrinsic to the 
validity of the test. Of many characteristics that may contribute to 
performance on a test, only one may correlate highly with performance 
on criterion tasks. While theoretically the square root of the reli- 
ability of a test sets the upper limit to its validity, customarily 
reliabilities greatly exceed validities. Much of the difference could 
indeed be due to the assessment of characteristics not intrinsically 
related to criterion performance. 
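
The upper limit mentioned above is the standard result of classical test theory. As a brief illustration (the symbols are introduced here and do not appear in the original report), let $r_{xy}$ be the validity of test $x$ against criterion $y$, and let $r_{xx'}$ and $r_{yy'}$ be the reliabilities of the test and the criterion:

```latex
r_{xy} \;\le\; \sqrt{r_{xx'}\, r_{yy'}} \;\le\; \sqrt{r_{xx'}}
```

For example, a test with a reliability of .81 could not have a validity above $\sqrt{.81} = .90$, even against a perfectly reliable criterion.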

The development of rapport during the first months of schooling, for 
instance, may be reflected in better rapport and increased scores on an 
intelligence scale. Educational experiences contributing to task per- 
sistence might very well be reflected in a testing situation. A program 
that elicits a greater degree of pupil activity, both motor and verbal,
may result in more relevant behavior in a testing situation, without
actually contributing to the relative strength of specific characteris-
tics that the test is intended to measure. But it is not so important
to catalog as to illustrate the possibility of developing some form of
"test-wiseness" that is extrinsic to the validity of a test.

While gains on standardized tests may be taken as support for the claim 
that the curriculum contributes to the development of general skills, they
certainly fall short of an adequate demonstration. Ideally, then, the
assessment of general characteristics would involve specific behavioral 
observation in a situation so structured that the desired characteris- 
tics would be unambiguously manifested in the subject’s performance. 

The foregoing consideration of problems related to the type of curric- 
ulum being developed, and for which teachers and aides were to be 
prepared, suggested the following questions to be explored, at least in 
a preliminary way, using data from the summer classes: 

1. Do children trained by procedures dictated by the Bereiter- 
Engelmann approach actually attain the characteristics 
ostensibly cultivated by the curriculum? In the present case, 
the curriculum minimally attempts to develop a general 
facility in the use of language. Do the children attain even 
a specific ability in the use of language? 

2. Do the techniques employed with the present curriculum consti- 
tute an efficient means toward the attainment of the objectives 
of the curriculum? 

3. To what extent is there experimental confirmation of a hierar- 
chical structure in the curriculum? 

4. Does the curriculum result in the development of generalizable 
skills? An important objective of the instruction is the de-
velopment of logical skills that are presumably fostered by a
mastery of the language. Can situations be developed that 
would provide for the demonstration of general skills? 

The following sections will be addressed to these specific areas of 
concern. 

Procedures 

Subjects. The teachers who were trained under the provisions of the
present study were assigned to six summer Head Start classes in four geo-
graphic areas on Oahu, and the pupils within these classes were used as
the subjects for the present study. For the purpose of providing some
meaningful comparisons, these subjects were divided into two groups on
the basis of their geographic location, group 1 comprising those pupils
from Pearl City, Waimanalo, Palolo, and Honolulu (N=28), and group 2
consisting of those from the Kalihi area (N=21). All were taught the
experimental language curriculum that has been under development at the
University of Hawaii Head Start Evaluation and Research Center.

It was originally intended that additional pupils from among the remain- 
ing Head Start classes would be selected to provide an informal control 
condition for the purpose of assessing the utility of the curriculum 
under study. It was subsequently learned, however, that the state OEO 
office had provided all of the summer Head Start teachers with a curric-
ulum guide that, although not based exclusively upon the Bereiter-
Engelmann curriculum, was substantially influenced by this approach to
language development. The main characteristic that differentiated the
curriculum provided for use with the remaining classes was its scope;
the pupils were given training in virtually all areas of the Bereiter-
Engelmann curriculum that were subsequently tested, although the inten-
sity of the training on that portion covered in the experimental
classes was not commensurate with that provided by the experimental 
curriculum. 

Because of general differences in ability that occur among children 
within the various geographic areas, only children from one area, Kalihi,
where there was a sufficiently large population to provide subjects for
both conditions, were included in the present study for the purpose of
providing the sort of informal assessment of the experimental curriculum
that could result from comparisons of achievement among the pupils. It
should be emphasized that the children used for comparison purposes,
group 3 (N=20), although selected from the same geographic area as the
children in group 2, were not randomly assigned to this condition. Any
comparisons made are necessarily inconclusive, and are intended to be
suggestive only.

Analogue Task Development. An important objective of the present inves-
tigation was the development of some independent means for assessing
outcomes of the language curriculum under development. An approach that 
seemed worth while was to attempt to develop a series of tasks that 
would require the same cognitive skills as those thought to be developed 
by the language curriculum, but tasks that could be administered non- 
verbally. 

Among those tasks used in experimental learning studies, the procedures 
associated with discrimination learning seemed most suitable, for they 
permit a motor response rather than a verbal response and the discrimi- 
nations can require fairly sophisticated concepts. An apparatus was 
built along the lines of the Wisconsin Test Apparatus, with two compart- 
ments over which stimulus cards can be placed. The stimulus dimensions 
were restricted, initially, to variations in color for the elementary 
problems, and color and shape for the more complex problems. With these 
conditions, it was possible to develop a series of tasks that appeared 
to be analogous to many of those in the curriculum. 

1. Black positive, other colors negative. This task requires a 
consistent response to a particular stimulus and consequently 
is similar to an identity statement. It does in fact require 
that the subject identify one particular stimulus within the
set of stimuli employed. 

2. Black negative. This is a simple reversal of the above task; 
it is analogous to a negative statement to the extent that 
"not black" defines the concept that would be reflected in 
criterion performance.

3. Same tasks as above, but with one irrelevant dimension, shape,
added.

4. Simple conjunction. Either of two attributes, one color and
one shape, is positive.

5. Conjunction. Two specific attributes, one color in combination
with one shape, constitute a positive instance.

6. Disjunction. Either one of two attributes is positive, but not
in combination.
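
The six task rules listed above can be written out as simple predicates over a stimulus card. The sketch below is illustrative only: the report does not say which color and shape were used in tasks 4 through 6, so TARGET_COLOR and TARGET_SHAPE are hypothetical, and item 4 is read here as "either attribute, or both," which is an assumption about the intended rule.

```python
# Hypothetical target attributes; the report does not name the color and
# shape actually used in tasks 4-6.
TARGET_COLOR = "red"
TARGET_SHAPE = "circle"


def is_positive(task, color, shape):
    """Return True if the card with this color and shape is the rewarded
    (positive) stimulus under the rule for the given task number."""
    has_color = color == TARGET_COLOR
    has_shape = shape == TARGET_SHAPE
    if task in (1, 3):
        # Black positive. Task 3 uses the same rule as tasks 1 and 2 while
        # shape varies irrelevantly; only its black-positive form is
        # modeled under the number 3 here.
        return color == "black"
    if task == 2:
        # Black negative: any card that is not black is positive.
        return color != "black"
    if task == 4:
        # Either attribute is positive (assumed to include both together).
        return has_color or has_shape
    if task == 5:
        # Both attributes must be present in combination.
        return has_color and has_shape
    if task == 6:
        # Exactly one of the two attributes, but not both.
        return has_color != has_shape
    raise ValueError(f"unknown task number: {task}")


print(is_positive(5, "red", "circle"))
print(is_positive(6, "red", "circle"))
```

Writing the rules this way makes the difference between items 4 and 6 explicit: they disagree only on the card that carries both target attributes.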

Following their initial development, the tasks were tried out with a 
number of the children in order to test their suitability for the sample 
under study. The children were told only that they were going to play 
a game, that each time the apparatus was revealed they were to pick up 
one of the cards and if they found a piece of candy they could keep it. 
The most significant thing to emerge from these preliminary trials was 
that the children were extremely variable in their performance. Some 
children very quickly reached a criterion level of performance on the 
first task and very readily made the reversal required by the second. 
Other children, however, continued fifty trials or more without reaching 
criterion on the first task. 

The major question related to these findings was whether the children 
who failed to evidence learning of the first problem failed because of 
the difficulty of the concept that was presented (which was a simple 
identity) or whether their failure was simply an artifact due to the 
experimental procedure. 

In order to test the first hypothesis, that the variability was due to 
differences in the ability of the children to learn the concept, addi- 
tional subjects were employed for the same tasks, with the exception 
that in this case the rule was specified verbally, e.g., "The candy is
always under the black one." Given the verbal rule, the children re-
flected very little variability in performance, coming through all
trials pretty much error free. This suggested strongly that the vari- 
ability of the students in the first series had been due to something 
other than the cognitive skill thought to be relevant; for the subjects, 
provided the rule, could apply it unerringly. 

In reexamining the procedure used with the first group of children, it 
was observed that some children, in a real sense, had reflected a set 
for the type of task that was presented; that is, they behaved as if
they expected to find some rule of correspondence between the stimulus 
and the occurrence of the candy. For those children, anything less than 
perfect performance would provide information concerning the inadequacy 
of their rule, which would subsequently be modified and retested. Other 
children, without such a set, were reinforced on a 50% schedule for
participating in the game, and, as a matter of fact, since they were
continued for many more trials, received more candy during the duration
of this experiment than did those who were successful in the discrimi-
nation.
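
The arithmetic behind this observation is simple. The sketch below assumes two compartments, so that a child responding without a rule finds candy on half of the trials, and it uses hypothetical trial counts: fifty chance trials for a non-learner (the report mentions fifty trials or more), and six chance trials followed by ten consecutive correct trials for a learner. The criterion run of ten is an assumption, since the report does not state the criterion.

```python
def expected_candy(chance_trials, correct_trials, p_chance=0.5):
    """Expected pieces of candy: chance-level trials pay off with
    probability p_chance, correct (rule-governed) trials always pay off."""
    return chance_trials * p_chance + correct_trials


# Hypothetical non-learner: 50 trials at chance, never reaching criterion.
non_learner = expected_candy(chance_trials=50, correct_trials=0)

# Hypothetical learner: 6 trials at chance, then 10 correct trials.
learner = expected_candy(chance_trials=6, correct_trials=10)

print(non_learner, learner)
```

Under these assumed figures the child who never learns the rule expects about twice as much candy as the child who learns it quickly, which is the incentive problem described above.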



On the assumption that the application of a rule in a specific situation
implies the presence of the particular cognitive skills [...] the
attainment of the rule, several of the analogue tasks were subjected
to further test under the verbal procedure, that is, with the rule
stated. It was found that there was very little variance in the per-
formance of the subjects under these conditions; so little, as a matter
of fact, that the first set of tasks seemed inappropriate for the pur-
pose intended; very nearly all of the subjects manifested a fairly high
degree of facility with the concepts of identity, negation, conjunction, 
and disjunction. The striking characteristic that served to differen- 
tiate among the members of the sample, on the other hand, was a set 
toward the formation of concepts or rules of correspondence. 



The level of proficiency of the subjects, at least as manifested in the 
types of tasks that were used, generally exceeded that level of skill 
that was being trained in the summer curriculum, but not necessarily the
cognitive skills developed by the curriculum generally. Additional 
tasks were developed, therefore, that sampled more broadly the cognitive 
skills that are ostensibly cultivated by the complete language curricu- 
lum. 



The series of tasks subsequently developed can be divided into three
sets: one requiring the identification of five colors, three shapes,
and two sizes; a second requiring a response based upon these character-
istics; and a larger set in which two characteristics are correlated in
a display while only one characteristic is revealed in a test situation.
In these items, the correlation of characteristics should provide the
basis for an inference concerning the "missing" characteristic. In
anticipation of further attempts to devise a technique for evaluating
the curriculum, each of the tasks was constructed in such a fashion
that it could be used in concept learning studies. Again the first
concern was with the level of difficulty of the tasks (the ability to 
discriminate among the members of the sample under study) and the cor- 
respondence of the tasks to those abilities fostered by the curriculum. 

The analogue tasks were subsequently administered at the same time as 
the School Readiness Tasks in order to get additional information con-
cerning the characteristics of the tasks.

Testing. Much of the time during the actual conduct of the summer Head
Start classes was given over to the development of standard criteria for
the assessment of the pupil's level of achievement.
The School Readiness Tasks (SRT) by Barbara Bateman¹, since it was spe-
cifically designed for this purpose, was adopted as a criterion of
achievement for the curriculum. Following the [...] pupils from groups 2
and 3 attended common classes, thus receiving a "[...]" treatment, and
were drawn from these [...] of testing both with the SRT and the analogue
tasks. Considerable delay was realized in making necessary arrangements
for the testing to take place; consequently the final testing did not take
place until about two months following the beginning of the regular
kindergarten class.

Analysis. The SRT tests were scored according to the criteria that had
been developed by the University of Hawaii Head Start Evaluation and
Research Center, and scores for both emerging and adequate [...]
performance were recorded for each subject, with the provision that if
the subject was judged adequate on any task he was also credited with the
emerging level of achievement. The sum of these two scores, then,
produced a three-point rating scale of the students' level of achieve-
ment on each Bateman task. The scoring of the performance of the
subjects on the analogue tasks proceeded in a binary fashion, the
subject receiving a score of one for each problem [...].
Intercorrelations among all of the items as well as [...]
total scores were computed, as well as means and standard deviations for
each item, sub-scale, and total score.
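
The scoring rules described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Center's actual procedure: the function names are hypothetical, the sample ratings are invented for the example, and the choice of the sample (rather than population) standard deviation is an assumption.

```python
from statistics import mean, stdev


def srt_item_score(emerging, adequate):
    """Score one SRT task on the three-point scale (0, 1, or 2).

    A subject judged adequate is also credited with the emerging level,
    so the sum of the two judgments is 2 for adequate, 1 for emerging
    only, and 0 for neither.
    """
    if adequate:
        emerging = True
    return int(emerging) + int(adequate)


def analogue_item_score(correct):
    """Score one analogue problem in binary fashion: 1 or 0."""
    return 1 if correct else 0


# Hypothetical (emerging, adequate) judgments for four subjects on one task.
judgments = [(True, True), (False, True), (True, False), (False, False)]
scores = [srt_item_score(e, a) for e, a in judgments]

print(scores)
print(mean(scores), stdev(scores))
```

If every subject is judged adequate on a task, every score is 2 and the task mean is 2.00, which is the ceiling referred to in the results.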



Results 

The results of the investigation are presented in four sections corres-
ponding to the four problems that were presented in the preceding
sections.

Student Achievement. The experimental curriculum presumably varies in
two dimensions, horizontally, in the sense that [...]
presented for each particular linguistic form, and vertically, indicat-
ing that various linguistic forms (basic sentence structures) are
presented in the order of increasing complexity. The SRT [...]
the presentation of materials in conjunction with questions that require
answers in the grammatical forms that comprise the curriculum. A
criterion of adequacy is applied to the responses, and the subject is
rated on a three-point scale, 2 for adequate, 1 for emerging, and 0 for
inadequate, in respect to each of the particular skills. If all stu-
dents pass the criteria for success at some particular level, the mean
for the task is 2.00.

¹Experimental Edition, unpublished mimeographed form, University of
Illinois.

Summary statistics for the testing with the Bateman are presented in
Table 1. Only the first 6 tasks, through prepositions, were included in
the summer curriculum for groups 1 and 2, so assessment for this portion
should be restricted to these skills. All of the pupils in groups 1 and 
2 performed to criteria on the first two tasks (la and lb). They were 
uniformly successful in their use of first-order identity statements,
e.g., "This is a ____" and "This is not a ____." With the exception
of one pupil, all also were adequate in their use of second-order
descriptive statements (2a and 2b) of the form "This is ____"
and "This is not ____." The criterion for adequacy for the
fifth task requires a polar deduction for each of four polar opposites,
big-little (or small), down-up, long-short, and fat-thin (or skinny),
e.g., "This box is not big. It is ____." Only 12% of the subjects
responded correctly to all of these items, although 86% responded cor-
rectly to at least 2 of the 4 items. This is somewhat surprising in
light of the fact that training was ostensibly given on all of these 
pairs. It may be that since for most classes this was the terminal 
level of instruction during the summer program, training at this level
was not complete. 

Comparison with Control Subjects. As previously mentioned, the term
"control" is used only in a casual sense in the present study. The stu- 
dents could not be randomly assigned to classes, and there was in fact 
no control exercised over the curriculum offered to the children in the 
other classes. The teachers who taught the control classes had been 
provided with a version of the Bereiter curriculum that was condensed 
but that had wider coverage. Further, the control subjects (group 3) 
were confined to one geographic area (Kalihi), whereas the remaining
subjects were drawn from four different areas, only those in group 2 coming
from the same area. These qualifications are important in light of the
fact that a comparison between experimental and control subjects shows
that over all tasks, including those that were not specifically included
in the summer curriculum, the control subjects performed significantly 
higher (p < .05) than the experimental subjects, collectively, and higher 
than group 2 specifically (p < .01). 

Comparison of the achievement of the three groups on the specific tasks 
on which training was given shows that all groups performed at essen- 
tially the same level on the first four of the tasks. 

Training on these tasks absorbed the major portion of the effort of the 
teachers in the experimental classes and relatively less effort within 
the remaining classes. Consequently, assuming an initially equivalent 
level of skill among the pupils in groups 2 and 3, it may be that the 
level of achievement necessary to reach criteria on the Bateman test 
could have been achieved with greater efficiency with the experimental 
curriculum. It is possible that the test itself fails to assess the 
scope of the educational attainment that has been realized with the cur- 
riculum, in which case a revision of the test would be indicated. 

The differences between groups 2 and 3 on the remaining two items that 
deal with skills included to some degree in the summer Head Start project
were statistically significant, group 3 doing appreciably better on
the polar opposites task and group 2 exceeding group 3 in the use of
prepositions. To attempt an interpretation of the first difference
would be hazardous, but the second can probably be related to the
specificity of the particular skill assessed. Relational aspects were
covered in both curricula, but only the experimental curriculum gave
a high degree of consideration to the exact words required by the
Bateman for the expression of relations. For instance, "underneath" was
not acceptable in place of "under." The superiority of the experimental
groups would have been greater had the teachers progressed beyond this
point during the summer curriculum.

The general superiority of group 3 over group 2 on the remaining items 
(p < .01), those on which the experimental groups did not receive summer 
training, could be attributed to either the effect of their own curriculum,
which did in fact embrace these topics, or to a general superiority
of the children in the control group. With respect to the former hypo- 
thesis, it is difficult to conclude other than that the efficiency of 
the curriculum, as it was presented to the summer students, leaves much 
to be desired, for greater gains could have been realized across the 
curriculum generally as it was used for the control subjects. With 
respect to the latter hypothesis, the difference between the two groups 
in the use of prepositions would have been even greater with comparable
groups. The data suggest that general ability is no substitute for
specific training when the objectives of instruction are very specific.

Tests of Vertical Structure. The sequence in which the tasks comprising
the curriculum are presented is dictated by logical hierarchical ordering,
presumably proceeding from the relatively simple to the more complex.
If in fact there is a latent organization that dictates such a
hierarchy, there will be two restrictions on the statistical
characteristics of the students' performance on the tasks. In the first place,
the mean level of performance on the tasks should descend as the tasks
ascend the hierarchy if there is any true variance among the students.
The reason is that attainment on a lower task would be prerequisite to
attainment at a higher level. While it would be possible to attain a
lower level without attaining a higher level of functioning, it would be
impossible to attain the higher level without having first attained the
lower level. In this sense, then, the mean of a task at a lower level
imposes the upper limit for a mean at a higher level. Any true variance
among the subjects would be reflected in the variance of the means on
the particular tasks, and since they can vary in only one direction, it
would necessitate that the means descend at increasingly higher levels
of the curriculum.

Since the reliability of the full scale, based upon internal consis- 
tency (KR-20), was .71, indicating that in fact there is some true 
variance among the subjects, it is legitimate to examine the progression 
of the means. 
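
The internal-consistency estimate cited above can be illustrated with a minimal sketch of the Kuder-Richardson formula 20. The sketch assumes that each task is scored dichotomously (1 for pass, 0 for fail), as in the binary rescoring used later in this section; the function name and the sample responses are hypothetical and are not data from the present study.

```python
from typing import Sequence


def kr20(scores: Sequence[Sequence[int]]) -> float:
    """Kuder-Richardson formula 20 for a subjects x items matrix of 0/1 scores."""
    n_subjects = len(scores)
    n_items = len(scores[0])
    if n_subjects < 2 or n_items < 2:
        raise ValueError("need at least two subjects and two items")

    # Proportion passing each item (p) and the sum of p * q over items.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in scores) / n_subjects
        sum_pq += p * (1.0 - p)

    # Variance of the total scores (population form, matching the p * q terms).
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects
    if var_total == 0:
        raise ValueError("total scores have no variance")

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical responses for five pupils on four items (not data from the report).
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(example), 2))
```

A coefficient above zero indicates that the items covary, which is the sense in which the reported value is taken as evidence of true variance among the subjects.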

It is necessary to disregard the first 5 tasks in this respect, for the
training has had the effect of bringing all subjects to the same level.
The last three items were also eliminated, since they are regarded as
more discrete skills, independently of their location in the scale.



Since most of the discrepancies were associated with two specific items,
8a and 8c, these items were eliminated from further analysis. Among the
remaining seven items, 15 of a possible 21 comparisons were confirming
of the hypothesis. This yields a chi-square of 3.85, significant at the .95
level of confidence, but this confidence is attenuated by the fact that
the two items were deliberately eliminated because they failed to
conform to expectation.

Given the required progression of the means among the remaining items,
there is one additional characteristic required for vertical arrangement
of the items. If attainment at a lower level is prerequisite to
attainment at a higher level, then a judgment of adequacy at a higher
level implies that a student is adequate at a lower level.

In terms of a fourfold table, the frequency of subjects passing the
"higher" item but failing the "lower" item will be zero, and the ordinary
formula for a phi coefficient reduces to

$$\phi = \frac{\bar{Y} - \bar{X}\bar{Y}}{\sqrt{(\bar{X} - \bar{X}^{2})(\bar{Y} - \bar{Y}^{2})}},$$

where $\bar{X}$ and $\bar{Y}$ are the means of the "lower" and "higher"
items, respectively, and $\bar{X} \geq \bar{Y}$.

For the purpose of examining the intercorrelations actually obtained in
comparison with those whose expectancy is developed above, the subjects
were scored in a binary fashion, e.g., 1 for adequate and 0 for emerging
or inadequate.

For those combinations of tasks where the mean for the "lower" task was
greater than the mean for the "higher" task, a condition necessary for
the inference that the attainment of a "lower" skill was prerequisite
for the attainment of the "higher," the expected correlation was
computed for comparison with the actual correlation. Table 2 shows the
variables that were compared in this fashion, the expected correlations
and the actual correlations.

There is little correspondence between the two columns. Perhaps
exceptions to this generalization are the correlations involving
combinations of 5a, 5b and 8b. Although these correlations are markedly
Table 2

Correlation Expected on the Basis of a Hierarchical Structure
In Comparison to Those Actually Obtained

Variables     Expected     Obtained

3, 4            .80           .12
3, 6            .39          -.05
4, 6            .48          -.04
5a, 5b          .81           .56
5a, 6           .13          -.14
5a, 7           .30           .08
5a, 8b          .57           .28
5b, 6           .19          -.11
5b, 7           .62           .08
5b, 8b          .71           .40

N = 49

attenuated, they are consistent in direction and general magnitude with
the expectations developed above, and the correlations are significant
(p < .05). The actual criteria imposed in the determination of adequacy
can markedly affect the progression of the means, and the present
analysis is presented within the context of the present scoring criteria.
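
The two checks described in this section can be sketched as follows. The sketch rests on two assumptions that the text does not state outright: that the test of the 15 confirming comparisons was made against an even split of confirming and disconfirming outcomes, and that the expected correlation is the phi coefficient obtained when the cell for passing the "higher" item while failing the "lower" item is empty. The function names and the sample pass rates are hypothetical.

```python
import math


def chi_square_even_split(confirming: int, total: int) -> float:
    """Chi-square (1 df) against a 50/50 split of confirming vs. disconfirming."""
    expected = total / 2.0
    disconfirming = total - confirming
    return ((confirming - expected) ** 2 + (disconfirming - expected) ** 2) / expected


def expected_phi(p_lower: float, p_higher: float) -> float:
    """Phi expected when no one passes the higher item while failing the lower.

    p_lower and p_higher are the proportions passing each item (binary means).
    """
    if not (0.0 < p_higher <= p_lower < 1.0):
        raise ValueError("requires 0 < p_higher <= p_lower < 1")
    # With the 'pass higher, fail lower' cell empty, P(pass both) = p_higher.
    covariance = p_higher - p_lower * p_higher
    return covariance / math.sqrt(p_lower * (1 - p_lower) * p_higher * (1 - p_higher))


# 15 of 21 comparisons confirming, as reported in the text.
print(round(chi_square_even_split(15, 21), 2))

# Hypothetical pass rates (not taken from the report).
print(round(expected_phi(0.60, 0.40), 2))
```

Under the even-split assumption the statistic for 15 of 21 comes to about 3.86, which agrees with the reported 3.85 if that figure was truncated rather than rounded. The expected phi reaches 1.00 only when the two items have equal means, so a large gap between the means lowers the expected correlation even under a perfect hierarchy.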

Analogue Tasks. The analysis of the analogue tasks proceeded along
several lines. The tasks were separated into six sets, all of which,
generally, were more similar among themselves than with the remaining
tasks, and KR-20 estimates of reliability were obtained. The reliabilities
in Table 3 are quite high, especially in light of the limited
number of items associated with each sub-scale. There is also evidence
of a fairly strong general factor reflected in the reliability of the
total score (.86).

Evidence of the benefit of the experience with the first set of analogue 
tasks is found in the fact that of the 30 items in the present form,
only 3 were subsequently discarded for lack of variance. The mean item
difficulties indicate that in the main the items are discriminating
quite well, and in this case there is a general decline of the means as
the complexity of the reasoning is increased.
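
The mean item difficulty for a set of tasks can be recovered from the set mean and the number of items, which offers a simple arithmetic check on Table 3. The following is a minimal sketch that uses the means and item counts listed in that table; the function and variable names are hypothetical.

```python
# (label, number of items, mean score) for each set, as listed in Table 3.
subscales = [
    ("Color Naming", 5, 3.85),
    ("Shape Naming", 3, 2.63),
    ("Simple Response", 6, 5.43),
    ("Conditioned Response", 3, 1.76),
    ("Inference", 8, 3.88),
    ("Complex Inference", 2, 0.78),
]


def mean_item_difficulty(mean_score: float, n_items: int) -> float:
    """Average proportion of items passed: the set mean divided by its length."""
    return mean_score / n_items


difficulties = {name: mean_item_difficulty(mean, k) for name, k, mean in subscales}
for name, p in difficulties.items():
    print(f"{name:22s} p = {p:.3f}")
```

The proportions fall steadily from the simple response set through the complex inference set, which is the decline referred to above.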

Examination of the tasks reveals that successful performance presupposes
a knowledge of the names for the relevant stimulus dimensions, the color
and shape in most instances. The variance in these [...]
accounts for at least part of the variance on the more difficult items.
But the proportion of children responding correctly on the succeeding
items is considerably lower than would be anticipated simply on the
basis of success in the basic skills, especially [...]
made in respect to the specific color and shape used in the more difficult
task. It does appear, therefore, that an important characteristic
is being tapped by the present series of tasks that [...]
for the assessment of some portions of the [...]
other curriculum that holds the development of logical [...]
an important objective.

Turning directly to the correspondence of the analogue tasks with the
curriculum, performance on the analogue tasks can be compared with the
tasks comprising the Bateman test. Certainly it would be hoped that the
performance of subjects on the two instruments would be correlated to
some degree, thus affirming that "we are at least in the same ball park"
with the analogue tasks. At the same time, the correlations should not
be too high, for the Bateman should be measuring a selection of fairly
specific abilities that would not be common to the analogue tasks. The
over-all correlation of .52 between the two instruments confirms general
expectations along these lines, but because of the global nature of the
total scores, tends not to be particularly revealing. An alternative
explanation of the correlation might be that both instruments are simply
measuring a common general factor, "intelligence." [...]
variance might contribute to the correlation, the reliabilities of the
Table 3

Analysis of Analogue Tasks with Means, Standard Deviations,
Mean Item Difficulties, and Estimates of Reliability

Set   Tasks                  No. of Items    Mean     SD      p     KR-20

I     Color Naming                 5          3.85    1.46    .77    .75
II    Shape Naming                 3          2.63     .72    .88    .60
III   Simple Response              6          5.43    1.01    .90    .46
IV    Conditioned Response         3          1.76     .94    .58    .28
V     Inference                    8          3.88    2.68    .48    .85
VI    Complex Inference            2           .78     .54    .39     —
T     Total Scale                 27         18.33    3.23    .68    .86


two scales would lead one to expect a correlation of .60 or higher.

The actual correlation, then, supports the interpretation of specific 
factor variance, even though it is not significantly lower than the 
expected correlation. 

In comparing performance on the analogue tasks with performance on the 
Bateman, it is possible to look at the relationship with two types of 
criteria employed with the Bateman, the proportion of subjects judged 
adequate in their responses, and the proportion judged emerging. The
intercorrelations among items were first examined to see if they
revealed any coherent order that would support the claim of legitimate
relationships rather than fortuitous correspondence. In the main, this
was confirmed. Comparisons involving the first four Bateman tasks were
precluded by the lack of variance on these items. Using the emerging
criterion on Task 3 (polar opposites) of the Bateman, there were
significant correlations with the analogue tasks requiring negation (.29) and
polar negation (.33), and generally consistent but low correlations with
the set of 8 items requiring inference (group V), the combined
correlation being .24, which is significant only at the .10 level. Within the
context of these data, the variable seems to be tapping several general
characteristics when assessed at the emerging level, but no appreciable
relationships were found when the other criteria were employed.
Evidently, the difference between "emerging" and "adequate" rests upon the
attainment of some fairly specific skills, and the criteria should be 
reexamined. 
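
The significance levels quoted for these correlations can be checked with the usual t test for a Pearson correlation. The following is a minimal sketch; it assumes that the N of 49 shown under Table 2 also applies to these comparisons, and it uses approximate two-tailed critical values of t for 47 degrees of freedom. The function name and the constants are hypothetical and do not come from the report.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic (n - 2 degrees of freedom) for a Pearson correlation."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("requires n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


N = 49  # assumption: the N shown under Table 2 also applies here
# Approximate two-tailed critical values of t for 47 degrees of freedom.
T_CRIT_05 = 2.01
T_CRIT_10 = 1.68

for label, r in [("negation", 0.29), ("polar negation", 0.33), ("inference set", 0.24)]:
    t = t_for_correlation(r, N)
    print(f"{label:15s} r = {r:.2f}  t = {t:.2f}")
```

Under these assumptions the correlations of .29 and .33 exceed the .05 critical value, while .24 falls between the .10 and .05 values, which is consistent with the levels reported above.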

So few subjects passed the criteria for adequacy in the use of
prepositions (Task 4) that there is a reasonable chance that the correlations
obtained are simply fortuitous, but there was a marked correlation
between this variable and the first inference task in the analogue series
(.45 on the first trial, .44 on the second). Scores on these
trials also were significantly related to performance as judged by
the criteria for emerging (.28 and .29 for the first and second trials,
respectively). The correlations with the remaining inference items were
positive but generally low. 

Task 5a of the Bateman (positive categories) correlated significantly 
with the shape-naming analogue tasks in this case, suggesting that
shape-naming is a matter of classifying the stimuli on the basis of shape.
Negative categories, Task 5b, on the other hand, was significantly
related to the conditional response items (which include negation) and
to the simplest inference items among those on the analogue task. But
these findings hold only with the scores obtained from the emerging 
criteria. 

The means for the subjects on inference on the Bateman (Task 6) were
extremely low, thus restricting the variance on these items. Even so,
the only significant correlations were in association with two of the
inference items on the analogue tasks. This finding suggests that there
is some true covariance among the items, but this question should be
subjected to further investigation. A similar result occurred with Task 7,
negative inference, but in this case the correlations were less consistent
and were restricted to those involving the scores indicating adequate
responding.

Few of the remaining Bateman tasks presented consistent patterns of 
correlation with the analogue tasks, with the marked exception of 
Task 9, color-naming, which was significantly correlated with most of
the analogue tasks. This might be at first interpreted as evidence
of a general factor that was evidenced in all of the tasks, but might
more constructively be regarded as a reminder that nearly every analogue
task included color as a relevant dimension. Competency in color-naming
should be made a general requirement for successful performance on the
analogue tasks in order to eliminate that source of variance in performance on the
tasks involving inference. 

The correlations obtained provide some concurrent evaluation of the 
curriculum and indicate several tasks that might be used for further
evaluation of various portions of it. They fail, however, to provide
the unambiguous evaluation that has been consistently lacking in this
area of investigation. Such evaluation might profitably be undertaken
in a piecemeal fashion, studying only selected portions of a curriculum
at a time. It is far too simple to be either too close to a curriculum,
such that the evaluation task simply repeats the learning task, or too
far away. The findings reported above may have served in a small degree
to point the way.